Data¶

Note that all data is fictitious!

You can find more info about the data in the tables in the data dictionary.

  1. contacts: a transaction table containing Customer Service (CS) contacts.

  2. specialists: a dimension table containing information about CS specialists.

  3. users: a dimension table containing information about N26 customers.

  4. surveys: a transaction table containing survey responses.

Task¶

Imagine you are an "Operations Data Analyst" and the Operations Extended Leadership Team has tasked you with an analysis.

1. Customer Service Overview¶

a. You'll first want to give a high-level overview of what happened in Customer Service, based on the given data. Pick a set of (at least 3) KPIs you deem appropriate and visualise their development. Hint: In this sub-task, restrict yourself to contacts, specialists, surveys.

b. As we need to communicate to Workforce (WF) Management how many resources will be needed in the future, please determine how many members of staff will be needed per day of the week for the next 7 days after the last day in the dataset. You don't need to separate external and internal specialists as this will be done by the WF Manager.

2. CS Specialists¶

Create weekly leaderboards: breakdown per week for the top 5 inhouse CS specialists with respect to 3 KPIs that you think are suitable for measuring specialist performance. Plot the leaderboards showing full names and scores.

Notes: Let's say a week starts on Monday and has 7 days. Let's only consider weeks that are entirely contained in the dataset.
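Under this definition, the fully contained weeks can be derived from the data's date range: the first complete week starts on the first Monday on or after the earliest date, and the last one ends on the last Sunday on or before the latest date. A minimal sketch with illustrative dates (not the case-study data):

```python
import pandas as pd

# Illustrative dates standing in for the dataset's contact dates
dates = pd.Series(pd.to_datetime(["2020-06-01", "2020-06-20", "2020-07-05"]))
start, end = dates.min(), dates.max()

# First Monday on/after the first date, last Sunday on/before the last date
first_monday = start + pd.Timedelta(days=(7 - start.weekday()) % 7)
last_sunday = end - pd.Timedelta(days=(end.weekday() + 1) % 7)

# Only dates inside [first_monday, last_sunday] fall in complete weeks
complete = dates[(dates >= first_monday) & (dates <= last_sunday)]
weeks = complete.dt.isocalendar().week.unique()
```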

Importing Necessary Libraries¶

In [1]:
# standard imports
import pandas as pd
import plotly.express as px

Importing Datasets into Pandas Dataframe¶

In [2]:
# Load each CSV file into its own DataFrame
# read_csv with iso-8859-1 encoding

contacts_1 = pd.read_csv('ops_case_study__contacts.csv', encoding='iso-8859-1')
specialists_1 = pd.read_csv('ops_case_study__specialists.csv', encoding='iso-8859-1')
surveys_1 = pd.read_csv('ops_case_study__surveys.csv', encoding='iso-8859-1')
users_1 = pd.read_csv('ops_case_study__users.csv', encoding='iso-8859-1')

# Work on copies of each DataFrame so the original imports stay available if needed
contacts = contacts_1.copy()
specialists = specialists_1.copy()
surveys = surveys_1.copy()
users = users_1.copy()
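The four read-and-copy pairs above can also be written as one small helper; `load_csvs` below is an illustrative name (not part of the case-study code), and the in-memory demo stands in for the actual files:

```python
import io
import pandas as pd

def load_csvs(paths, encoding='iso-8859-1'):
    """Read each named CSV once and return a {name: DataFrame} mapping."""
    return {name: pd.read_csv(path, encoding=encoding)
            for name, path in paths.items()}

# With the actual case-study files this would be:
# tables = load_csvs({
#     'contacts': 'ops_case_study__contacts.csv',
#     'specialists': 'ops_case_study__specialists.csv',
#     'surveys': 'ops_case_study__surveys.csv',
#     'users': 'ops_case_study__users.csv',
# })

# read_csv also accepts file-like objects, so a tiny in-memory demo:
demo = load_csvs({'toy': io.BytesIO(b'a,b\n1,2\n')})
```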
In [3]:
# View the shape of each DataFrame
print('contacts', contacts.shape)
print('specialists', specialists.shape)
print('surveys', surveys.shape)
print('users', users.shape)
contacts (75014, 16)
specialists (202, 4)
surveys (5032, 5)
users (42323, 8)
In [4]:
# Viewing first 5 rows
contacts.head()
Out[4]:
id cs_tag channel channel_type contact_language cst_initiated abandoned is_inhouse initiated_date contact_date product wait_time handle_time specialist_id user_id link_id
0 c53370 kyc_issue chat 1st level es True NaN True 2020-06-20 16:31:34.000000 NaN A NaN NaN s95 u41070 NaN
1 c53424 not_tagged chat 1st level fr True True True 2020-06-20 16:53:04.000000 NaN NaN 151.0 NaN s201 NaN NaN
2 c52674 not_tagged chat 1st level de True True True 2020-06-20 10:56:51.000000 NaN NaN 150.0 NaN s201 NaN NaN
3 c52689 not_tagged chat 1st level de True True True 2020-06-20 11:04:12.000000 NaN NaN 82.0 NaN s201 NaN NaN
4 c53214 not_tagged chat 1st level es True True True 2020-06-20 15:13:32.000000 NaN NaN 27.0 NaN s201 NaN NaN
In [5]:
# Viewing first 5 rows
specialists.head()
Out[5]:
id is_inhouse hired_at full_name
0 s32 False 2019-02-06 Taylor Swift
1 s64 False 2019-05-10 Jennifer Aniston
2 s96 True 2019-08-29 George Clooney
3 s128 True 2019-11-01 Robert Downey Jr.
4 s160 True 2019-12-17 Miley Cyrus
In [6]:
# Viewing first 5 rows
surveys.head()
Out[6]:
contact_id response_date agent_satisfaction csat issue_resolved
0 c53109 2020-06-21 08:02:09.000000 9 8 True
1 c52087 2020-06-21 01:15:08.000000 10 8 False
2 c52532 2020-06-21 18:33:11.000000 4 4 False
3 c53651 2020-06-21 11:35:20.000000 10 10 True
4 c52529 2020-06-21 06:37:53.000000 9 8 True
In [7]:
# Viewing first 5 rows
users.head()
Out[7]:
id signed_up_at has_activated_card closed_account_at tnc_country_group os first_name last_name
0 u15586 2019-02-28 False NaN ITA android Bruce Fuller
1 u38150 2020-05-28 False NaN DEU android Mitchell Coleman
2 u11771 2018-11-01 True NaN FRA iOS Julian Thornton
3 u94 2015-04-29 True NaN DEU iOS Stella Norton
4 u36971 2020-05-23 False 2020-06-09 FRA iOS Lois Lee

Combining Datasets¶

Merging Specialists and Contacts¶

In [8]:
# Merging Contacts to Specialists
combined2 = contacts.merge(specialists, how='inner', left_on= 'specialist_id', right_on='id')
combined2.shape
Out[8]:
(75014, 20)
In [9]:
# Viewing First 5 rows of combined dataset
combined2.head()
Out[9]:
id_x cs_tag channel channel_type contact_language cst_initiated abandoned is_inhouse_x initiated_date contact_date product wait_time handle_time specialist_id user_id link_id id_y is_inhouse_y hired_at full_name
0 c53370 kyc_issue chat 1st level es True NaN True 2020-06-20 16:31:34.000000 NaN A NaN NaN s95 u41070 NaN s95 True 2019-08-29 Kourtney Kardashian
1 c53707 not_tagged call callback NaN False False True 2020-06-20 20:17:15.000000 2020-06-20 20:17:16.000000 NaN 1.0 99.0 s95 NaN NaN s95 True 2019-08-29 Kourtney Kardashian
2 c53718 not_tagged call callback NaN False False True 2020-06-20 20:27:12.000000 2020-06-20 20:27:13.000000 NaN 1.0 181.0 s95 NaN NaN s95 True 2019-08-29 Kourtney Kardashian
3 c53369 not_tagged chat 1st level es True False True 2020-06-20 16:31:34.000000 2020-06-20 16:32:20.000000 A 46.0 687.0 s95 u41070 NaN s95 True 2019-08-29 Kourtney Kardashian
4 c53488 kyc_issue chat 1st level es True False True 2020-06-20 17:14:40.000000 2020-06-20 17:17:03.000000 NaN 143.0 440.0 s95 NaN NaN s95 True 2019-08-29 Kourtney Kardashian
In [10]:
# Dropping Duplicate Columns
combined2 = combined2.drop(['is_inhouse_y','id_y'], axis = 1)
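Passing `suffixes` to `merge` keeps the contact columns unsuffixed, which avoids the `_x`/`_y` renaming that had to be cleaned up above. A sketch on toy frames (not the case-study data):

```python
import pandas as pd

# Toy stand-ins for the contacts and specialists tables
contacts_toy = pd.DataFrame({'id': ['c1', 'c2'],
                             'is_inhouse': [True, False],
                             'specialist_id': ['s1', 's1']})
specialists_toy = pd.DataFrame({'id': ['s1'],
                                'is_inhouse': [True],
                                'full_name': ['Ada Lovelace']})

# Left columns keep their plain names; only the right-hand duplicates get '_sp'
merged = contacts_toy.merge(specialists_toy, how='inner',
                            left_on='specialist_id', right_on='id',
                            suffixes=('', '_sp'))
merged = merged.drop(columns=['id_sp', 'is_inhouse_sp'])
```

Downstream code can then keep referring to `id` instead of `id_x`.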
In [11]:
# Viewing First 5 rows of the dataset
combined2.head()
Out[11]:
id_x cs_tag channel channel_type contact_language cst_initiated abandoned is_inhouse_x initiated_date contact_date product wait_time handle_time specialist_id user_id link_id hired_at full_name
0 c53370 kyc_issue chat 1st level es True NaN True 2020-06-20 16:31:34.000000 NaN A NaN NaN s95 u41070 NaN 2019-08-29 Kourtney Kardashian
1 c53707 not_tagged call callback NaN False False True 2020-06-20 20:17:15.000000 2020-06-20 20:17:16.000000 NaN 1.0 99.0 s95 NaN NaN 2019-08-29 Kourtney Kardashian
2 c53718 not_tagged call callback NaN False False True 2020-06-20 20:27:12.000000 2020-06-20 20:27:13.000000 NaN 1.0 181.0 s95 NaN NaN 2019-08-29 Kourtney Kardashian
3 c53369 not_tagged chat 1st level es True False True 2020-06-20 16:31:34.000000 2020-06-20 16:32:20.000000 A 46.0 687.0 s95 u41070 NaN 2019-08-29 Kourtney Kardashian
4 c53488 kyc_issue chat 1st level es True False True 2020-06-20 17:14:40.000000 2020-06-20 17:17:03.000000 NaN 143.0 440.0 s95 NaN NaN 2019-08-29 Kourtney Kardashian

Merging Combined2 Dataset with Surveys Dataset¶

In [12]:
# Merging datasets
combined3 = combined2.merge(surveys, how='outer', left_on= 'id_x', right_on='contact_id')
In [13]:
# Viewing rows and columns
combined3.shape
Out[13]:
(75014, 23)
In [14]:
# Viewing first 5 rows
combined3.head()
Out[14]:
id_x cs_tag channel channel_type contact_language cst_initiated abandoned is_inhouse_x initiated_date contact_date ... specialist_id user_id link_id hired_at full_name contact_id response_date agent_satisfaction csat issue_resolved
0 c53370 kyc_issue chat 1st level es True NaN True 2020-06-20 16:31:34.000000 NaN ... s95 u41070 NaN 2019-08-29 Kourtney Kardashian NaN NaN NaN NaN NaN
1 c53707 not_tagged call callback NaN False False True 2020-06-20 20:17:15.000000 2020-06-20 20:17:16.000000 ... s95 NaN NaN 2019-08-29 Kourtney Kardashian NaN NaN NaN NaN NaN
2 c53718 not_tagged call callback NaN False False True 2020-06-20 20:27:12.000000 2020-06-20 20:27:13.000000 ... s95 NaN NaN 2019-08-29 Kourtney Kardashian NaN NaN NaN NaN NaN
3 c53369 not_tagged chat 1st level es True False True 2020-06-20 16:31:34.000000 2020-06-20 16:32:20.000000 ... s95 u41070 NaN 2019-08-29 Kourtney Kardashian NaN NaN NaN NaN NaN
4 c53488 kyc_issue chat 1st level es True False True 2020-06-20 17:14:40.000000 2020-06-20 17:17:03.000000 ... s95 NaN NaN 2019-08-29 Kourtney Kardashian NaN NaN NaN NaN NaN

5 rows × 23 columns

In [15]:
# Dropping duplicate columns
combined3 = combined3.drop(['contact_id'], axis = 1)
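When auditing an outer merge like this one, `indicator=True` reports how many rows matched on each side, which is a quick sanity check that every survey found its contact. A toy sketch:

```python
import pandas as pd

left = pd.DataFrame({'id_x': ['c1', 'c2', 'c3']})
right = pd.DataFrame({'contact_id': ['c2'], 'csat': [9]})

# The added _merge column labels each row 'left_only', 'right_only', or 'both'
audit = left.merge(right, how='outer', left_on='id_x',
                   right_on='contact_id', indicator=True)
match_counts = audit['_merge'].value_counts()
```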

Merging Combined3 with Users Dataset¶

In [16]:
# Merging
combined4 = combined3.merge(users, how='left', left_on= 'user_id', right_on='id')
In [17]:
# First 5 rows
combined4.head()
Out[17]:
id_x cs_tag channel channel_type contact_language cst_initiated abandoned is_inhouse_x initiated_date contact_date ... csat issue_resolved id signed_up_at has_activated_card closed_account_at tnc_country_group os first_name last_name
0 c53370 kyc_issue chat 1st level es True NaN True 2020-06-20 16:31:34.000000 NaN ... NaN NaN u41070 2020-06-10 False NaN ESP android Maude Hayes
1 c53707 not_tagged call callback NaN False False True 2020-06-20 20:17:15.000000 2020-06-20 20:17:16.000000 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 c53718 not_tagged call callback NaN False False True 2020-06-20 20:27:12.000000 2020-06-20 20:27:13.000000 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 c53369 not_tagged chat 1st level es True False True 2020-06-20 16:31:34.000000 2020-06-20 16:32:20.000000 ... NaN NaN u41070 2020-06-10 False NaN ESP android Maude Hayes
4 c53488 kyc_issue chat 1st level es True False True 2020-06-20 17:14:40.000000 2020-06-20 17:17:03.000000 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 30 columns

In [18]:
# Rows and Columns
combined4.shape
Out[18]:
(75014, 30)
In [19]:
# Dropping duplicate columns
combined4 = combined4.drop(['id'], axis = 1)
In [20]:
# Viewing first 5 rows of the dataset
combined4.head()
Out[20]:
id_x cs_tag channel channel_type contact_language cst_initiated abandoned is_inhouse_x initiated_date contact_date ... agent_satisfaction csat issue_resolved signed_up_at has_activated_card closed_account_at tnc_country_group os first_name last_name
0 c53370 kyc_issue chat 1st level es True NaN True 2020-06-20 16:31:34.000000 NaN ... NaN NaN NaN 2020-06-10 False NaN ESP android Maude Hayes
1 c53707 not_tagged call callback NaN False False True 2020-06-20 20:17:15.000000 2020-06-20 20:17:16.000000 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 c53718 not_tagged call callback NaN False False True 2020-06-20 20:27:12.000000 2020-06-20 20:27:13.000000 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 c53369 not_tagged chat 1st level es True False True 2020-06-20 16:31:34.000000 2020-06-20 16:32:20.000000 ... NaN NaN NaN 2020-06-10 False NaN ESP android Maude Hayes
4 c53488 kyc_issue chat 1st level es True False True 2020-06-20 17:14:40.000000 2020-06-20 17:17:03.000000 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 29 columns

In [21]:
# Converting to csv file
combined4.to_csv('Final_Merged.csv')
In [22]:
# Checking for duplicate data
check_duplicate = combined4['id_x'].duplicated().sum()       
check_duplicate
Out[22]:
0

Task 1A (Overview)¶

1. HeadCount¶

In [23]:
# Finding Number of Available Specialists
headcount = specialists['id'].dropna().count()
print('Total Number of Specialists =', headcount)
Total Number of Specialists = 202

2. Total Contacts Handled¶

In [24]:
# Counting contacts (chat and call) with a recorded handle time
totalcalls = combined2['handle_time'].dropna().count()
print('Total Contacts Handled by Specialists =', totalcalls)
Total Contacts Handled by Specialists = 66053

3. Avg Response Time¶

In [25]:
# Finding the average wait time before a customer is answered
avg_response_time = combined2['wait_time'].dropna().mean()
print('Average Response/Wait Time = %.2f minutes' % (avg_response_time/60))
Average Response/Wait Time = 0.92 minutes

4. Avg Call Time¶

In [26]:
# Finding the average time taken to handle a customer
avg_call_time = combined2['handle_time'].dropna().mean()
print('Average Call/Handle Time = %.2f minutes' % (avg_call_time/60))
Average Call/Handle Time = 11.91 minutes

5. Channel Wise¶

In [27]:
# Counting contacts handled by specialists, per channel
Channelwise_Customers = combined3[combined3.channel.notnull()]
Channelwise_Customers = Channelwise_Customers.groupby('channel', as_index = False)['id_x'].count()
Channelwise_Customers
Out[27]:
channel id_x
0 call 7900
1 chat 67114
In [28]:
# Plotting the chart
fig = px.bar(Channelwise_Customers, x="channel", y="id_x", color="channel", title="Number of Customers Dealt by Specialists Channelwise")
fig.show()
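The filter-then-groupby pattern used here (and again for the language, country, and OS breakdowns below) can be shortened with `value_counts`, which drops NaNs by default. A toy sketch:

```python
import pandas as pd

# Toy stand-in for combined3['channel'], including a missing value
channel = pd.Series(['chat', 'call', 'chat', None, 'chat'])

# value_counts skips NaN; rename/reset rebuilds the two-column layout above
counts = (channel.value_counts()
          .rename_axis('channel')
          .reset_index(name='id_x'))
```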

6. Performance Wise¶

In [29]:
# Creating subdata set
top_performers = combined3[['specialist_id','full_name','csat']]
In [30]:
# Filtering the dataset
top_performers = top_performers[top_performers.csat.notnull()]
top_performers = top_performers.groupby('full_name',as_index=False)['csat'].mean()
In [31]:
# Sorting and extracting the best and worst 5 performers
top5_performers = top_performers.sort_values(by = 'csat', ascending = False).head()
worst5_performers = top_performers.sort_values(by = 'csat', ascending = True).head()
In [32]:
# Plotting the chart
fig = px.bar(top5_performers, x="full_name", y="csat", color="full_name", title="Top 5 Agents with Highest Average Rating")
fig.show()
In [33]:
# Plotting the chart
fig = px.bar(worst5_performers, x="full_name", y="csat", color="full_name", title="5 Agents with Lowest Average Rating")
fig.show()

7. Language Wise¶

In [34]:
# Filtering dataset
Languagewise_Customers = combined3[combined3.contact_language.notnull()]
Languagewise_Customers = Languagewise_Customers.groupby('contact_language', as_index = False)['id_x'].count()
Languagewise_Customers
Out[34]:
contact_language id_x
0 de 10664
1 en 17008
2 es 5428
3 fr 27882
4 it 7748
In [35]:
# Plotting the chart
fig = px.bar(Languagewise_Customers, x="contact_language", y="id_x", color="contact_language", title="Language of Customers Dealt by Specialists")
fig.show()

8. Country Wise¶

In [36]:
# Filtering the dataset
Country_Customers = combined4[combined4.tnc_country_group.notnull()] # Removing missing rows
Country_Customers = Country_Customers.groupby('tnc_country_group', as_index = False)['id_x'].count() # Counting Customers
Country_Customers
Out[36]:
tnc_country_group id_x
0 AUT 1303
1 DEU 13573
2 ESP 4050
3 FRA 24497
4 GBR 153
5 GrE 4864
6 ITA 7049
7 NEuro 868
In [37]:
# Plotting the chart
fig = px.bar(Country_Customers, x="tnc_country_group", y="id_x", color="tnc_country_group", title="Country of Customers Dealt by Specialists")
fig.show()

9. OS Wise¶

In [38]:
# Filtering the Dataset
OS_Customers = combined4[combined4.os.notnull()]
OS_Customers = OS_Customers.groupby('os', as_index = False)['id_x'].count()
OS_Customers
Out[38]:
os id_x
0 android 27568
1 iOS 28789
In [39]:
# Plotting the chart
fig = px.bar(OS_Customers, x="os", y="id_x", color="os", title="OS of Customers Dealt by Specialists")
fig.show()

10. Call Resolution¶

In [40]:
# Filtering the dataset
resolved = combined3[combined3.issue_resolved.notnull()]
resolved = resolved[resolved.issue_resolved == True]
resolved = resolved['issue_resolved'].count()
not_resolved = combined3[combined3.issue_resolved.notnull()]
not_resolved = not_resolved[not_resolved.issue_resolved == False]
not_resolved = not_resolved['issue_resolved'].count()
In [41]:
# Calculating Resolution Percentage
call_resolution = resolved/(resolved+not_resolved)*100
print("Call Resolution Percentage = %.2f percent" % call_resolution)
Call Resolution Percentage = 70.09 percent
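Because `issue_resolved` is boolean, the same percentage falls out of a single `mean` over the non-null rows, without the separate resolved/not-resolved counts. A toy sketch:

```python
import pandas as pd

# Toy stand-in for combined3['issue_resolved'], including a missing value
issue_resolved = pd.Series([True, True, False, None, True])

# Mean of booleans = share of True values; dropna excludes unanswered surveys
resolution_pct = issue_resolved.dropna().mean() * 100
```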

Task 1B (Predicting Number of Workforce)¶

In [42]:
# Number of unique contacts (each row is one customer interaction)
customer_count = contacts['id'].dropna().nunique()
customer_count
Out[42]:
75014
In [43]:
# Approximating call frequency via the mean day-of-month of contact initiation
contacts['initiated_date'] = pd.to_datetime(contacts['initiated_date'])
contacts['initiated_day'] = contacts.initiated_date.dt.day
frequency_of_call = contacts['initiated_day'].mean()
frequency_of_call
Out[43]:
15.540965686405205
In [44]:
# Average call time in minutes
avg_call_time = avg_call_time/60
avg_call_time
Out[44]:
11.906010577364642
In [45]:
# Calculating Workforce Required perday assuming each specialist works 480 minutes a day
Wokforce = (customer_count*frequency_of_call*avg_call_time)/(480*headcount)
Workforce = round(Wokforce,0)
print('Workforce Required Per day of Week =',Workforce)
Workforce Required Per day of Week = 143.0
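The estimate above is a single overall number, while the task asks for staffing per day of the week. One way to get there is to sum the handle time per calendar day, average those totals per weekday, and divide by a 480-minute shift. The sketch below uses synthetic data so it stands alone; with the real data, `hist` would be the `contacts` rows that have a non-null `handle_time`:

```python
import numpy as np
import pandas as pd

# Synthetic history: 28 days, 50 contacts/day, handle times in seconds
rng = np.random.default_rng(0)
dates = pd.date_range('2020-06-01', periods=28, freq='D').repeat(50)
hist = pd.DataFrame({
    'initiated_date': dates,
    'handle_time': rng.integers(300, 900, len(dates)),
})

# Total handled seconds per calendar day, then averaged per weekday
per_day = hist.groupby(hist['initiated_date'].dt.normalize())['handle_time'].sum()
avg_minutes = per_day.groupby(per_day.index.day_name()).mean() / 60

# Round up: a fraction of a shift still needs a whole specialist
staff_needed = np.ceil(avg_minutes / 480).astype(int)
```

The result is one headcount per weekday name, which maps directly onto the 7-day forecast window.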
In [46]:
# Viewing Number of customers channelwise
Channelwise_Customers
Out[46]:
channel id_x
0 call 7900
1 chat 67114
In [47]:
# Calculating each channel's share of all contacts
call_percentage = 7900/customer_count*100
chat_percentage = 67114/customer_count*100
In [48]:
# Estimating the workforce required for each channel
Workforce_chat = round((chat_percentage*Workforce)/100,0)
Workforce_call = round((call_percentage*Workforce)/100,0)
In [49]:
# Printing Chat Workforce Requirement
print('Chat Workforce Required =', Workforce_chat)
Chat Workforce Required = 128.0
In [50]:
# Printing Call Workforce Requirement
print('Call Workforce Required =', Workforce_call)
Call Workforce Required = 15.0

Task 2 (Creating Weekly Leaderboards According to KPI Performance)¶

In [51]:
# Creating a sub dataset for Task2
task2 = combined3[['response_date','specialist_id','full_name','is_inhouse_x','csat','issue_resolved','wait_time']]
In [52]:
task2 = task2[task2['is_inhouse_x'].notna()] # Removing rows where is_inhouse is missing
task2 = task2[task2['is_inhouse_x'] == True] # Keeping only in-house specialists
task2 = task2[task2['response_date'].notna()] # Removing rows where response_date is missing
task2 = task2[task2.issue_resolved.notnull()] # Removing rows where issue_resolved is missing
task2 = task2[task2.issue_resolved == True] # Keeping rows where the customer issue was resolved
task2.reset_index(inplace = True) # Resetting the index
task2 = task2.groupby(['response_date','full_name'], as_index=False).mean() # Averaging numeric columns per specialist per day
task2['response_date'] = pd.to_datetime(task2['response_date']) # Converting the column to datetime
task2['week'] = task2.response_date.dt.isocalendar().week # Extracting the ISO week number
task2 = task2.groupby(['week','full_name'],as_index=False)[['csat','wait_time']].mean() # Averaging per week and specialist
In [53]:
# Viewing First 5 Rows
task2.head()
Out[53]:
week full_name csat wait_time
0 23 Alec Baldwin 9.250000 42.00
1 23 Alyson Hannigan 9.250000 11.75
2 23 Amber Heard 9.000000 53.50
3 23 Ashton Kutcher 9.500000 9.50
4 23 Barack Obama 9.090909 99.00

Plotting Weekly Leaderboard Graphs Based on CSAT, Wait Time and Issue Resolution¶

In [54]:
# Plotting week 23 Leaderboard
week23_leader = task2[task2.week == 23].sort_values(['csat', 'wait_time'], ascending=[False, True]).head()
fig = px.bar(week23_leader, x="full_name", y="csat", color="full_name", title="Top 5 Agents Week 23")
fig.show()
In [55]:
# Plotting week 24 Leaderboard
week24_leader = task2[task2.week == 24].sort_values(['csat', 'wait_time'], ascending=[False, True]).head()
fig = px.bar(week24_leader, x="full_name", y="csat", color="full_name", title="Top 5 Agents Week 24")
fig.show()
In [56]:
# Plotting week 25 Leaderboard
week25_leader = task2[task2.week == 25].sort_values(['csat', 'wait_time'], ascending=[False, True]).head()
fig = px.bar(week25_leader, x="full_name", y="csat", color="full_name", title="Top 5 Agents Week 25")
fig.show()
In [57]:
# Plotting week 26 Leaderboard
week26_leader = task2[task2.week == 26].sort_values(['csat', 'wait_time'], ascending=[False, True]).head()
fig = px.bar(week26_leader, x="full_name", y="csat", color="full_name", title="Top 5 Agents Week 26")
fig.show()
In [58]:
# Plotting week 27 Leaderboard
week27_leader = task2[task2.week == 27].sort_values(['csat', 'wait_time'], ascending=[False, True]).head()
fig = px.bar(week27_leader, x="full_name", y="csat", color="full_name", title="Top 5 Agents Week 27")
fig.show()
In [59]:
# Plotting week 28 Leaderboard
week28_leader = task2[task2.week == 28].sort_values(['csat', 'wait_time'], ascending=[False, True]).head()
fig = px.bar(week28_leader, x="full_name", y="csat", color="full_name", title="Top 5 Agents Week 28")
fig.show()
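The six per-week cells above repeat the same sort-and-head logic; they can be collapsed into a single sort followed by a grouped `head`. A sketch on a toy frame shaped like `task2`:

```python
import pandas as pd

# Toy stand-in for the weekly task2 table
task2_toy = pd.DataFrame({
    'week': [23, 23, 23, 24, 24],
    'full_name': ['A', 'B', 'C', 'A', 'B'],
    'csat': [9.5, 8.0, 9.0, 7.0, 9.9],
    'wait_time': [10, 5, 20, 15, 8],
})

# Sort once (best csat first, ties broken by lower wait), then take the
# top 5 rows within each week
leaders = (task2_toy
           .sort_values(['week', 'csat', 'wait_time'],
                        ascending=[True, False, True])
           .groupby('week')
           .head(5))
```

With plotly, something like `px.bar(leaders, x='full_name', y='csat', color='full_name', facet_col='week')` would then render every weekly leaderboard in one faceted figure instead of one cell per week.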